07. Quiz: Test Your Intuition

## Playing Chess

Say you are an agent, and your goal is to play chess. At every time step, you choose an action from the set of possible moves in the game. Your opponent is part of the environment; she responds with her own move, and the state you receive at the next time step is the configuration of the board when it is your turn to choose a move again. The reward is delivered only at the end of the game and, let's say, is +1 if you win and -1 if you lose.

This is an episodic task, where an episode finishes when the game ends. The idea is that by playing the game many times, or by interacting with the environment in many episodes, you can learn to play chess better and better.

It's important to note that this problem is exceptionally difficult because the feedback is delivered only at the very end of the game. So, if you lose a game (and get a reward of -1 at the end of the episode), it's unclear when exactly you went wrong: maybe you played so badly that every move was horrible, or maybe you played beautifully for most of the game and then made one small mistake at the end.

When the reward signal is largely uninformative in this way, we say that the task suffers from the problem of sparse rewards. There's an entire area of research dedicated to this problem, and you're encouraged to read more about it if it interests you.

Playing Chess

In chess, what's an example of an action that the agent could take?

SOLUTION: Moving a piece

What's an example of a state in the game?

SOLUTION: The configuration of the board

Say you just started playing chess against your opponent, and it seems to be going great: you have played 20 moves and already taken five of your opponent's pieces. The game hasn't ended yet, so you're not 100% sure you'll win, but it seems likely. What cumulative reward have you received so far?

SOLUTION: 0
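
To see why the answer is 0, it can help to write the rewards out explicitly. The sketch below is only an illustration: the helper `cumulative_reward` and the 35-move finished game are hypothetical, and it assumes, as in the setup above, that every move before the end of the game earns a reward of 0.

```python
def cumulative_reward(rewards):
    """Sum the rewards received so far in the episode."""
    return sum(rewards)

# 20 moves played and the game is not over: each intermediate reward is 0,
# no matter how many pieces have been captured along the way.
rewards_so_far = [0] * 20
print(cumulative_reward(rewards_so_far))  # 0

# Hypothetical continuation: the agent wins on its 35th move,
# so the only non-zero reward arrives on the final time step.
finished_game = [0] * 34 + [1]
print(cumulative_reward(finished_game))  # 1
```

Capturing pieces changes the state (the board), not the reward, which is why the total stays at 0 until the game ends.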

## Escaping a Maze

Consider a game in which the agent is located in a maze and is trying to find the quickest route to the goal. If the agent receives an informative reward only upon reaching the goal, and all it can do is randomly explore the maze, it will not be able to learn anything until it reaches the goal at least once.

Navigating a Maze

In the hedge maze, what's an example of an action that the agent could take?

SOLUTION: Moving north in the maze
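
The maze scenario can be sketched in a few lines of code. Everything here is an assumption made for illustration and is not specified in the lesson: a 4x4 open grid stands in for the maze, the actions are the four compass moves, and the reward is +1 on reaching the goal and 0 otherwise. The point is only that a randomly exploring agent sees nothing but zeros until it happens to reach the goal.

```python
import random

# Hypothetical 4x4 open grid standing in for the maze (an assumption).
SIZE = 4
START, GOAL = (0, 0), (3, 3)
ACTIONS = {"north": (-1, 0), "south": (1, 0), "west": (0, -1), "east": (0, 1)}

def step(state, action):
    """Apply an action; a move that would leave the grid keeps the agent in place."""
    d_row, d_col = ACTIONS[action]
    row, col = state[0] + d_row, state[1] + d_col
    if not (0 <= row < SIZE and 0 <= col < SIZE):
        row, col = state
    next_state = (row, col)
    reward = 1 if next_state == GOAL else 0  # assumed sparse reward
    return next_state, reward, next_state == GOAL

def random_episode(rng, max_steps=50):
    """Run one episode of uniformly random exploration; return the rewards seen."""
    state, rewards = START, []
    for _ in range(max_steps):
        action = rng.choice(list(ACTIONS))
        state, reward, done = step(state, action)
        rewards.append(reward)
        if done:
            break
    return rewards

rng = random.Random(0)  # fixed seed so repeated runs behave the same
for episode in range(5):
    rewards = random_episode(rng)
    print(f"episode {episode}: steps={len(rewards)}, return={sum(rewards)}")
```

Any episode that ends with a return of 0 gave the agent no signal about which moves were helpful, which is the sparse-reward problem in miniature.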